Utilisation de la langue naturelle pour l'interrogation de documents structurés [Using natural language to query structured documents]
http://www.asso-aria.org/coria/2005/19.pdf
The query language is the indispensable interface between the user and the search tool. Reduced to its simplest form when engines mostly index flat documents, it becomes quite complex when it targets structured documents and constraints must be defined on both structure and content. The approach described here proposes using natural language as the interface for expressing such queries. The article first describes the successive phases that transform (in an information retrieval setting) the natural-language query into a context-independent semantic representation. Simplification rules adapted to the structure and domain of the corpus are then applied, yielding a final form suited to conversion into a formal query language. The article finally describes the experiments carried out and draws first conclusions on various aspects of this approach.
Justification of Answers by Verification of Dependency Relations: The French AVE Task
This paper presents LIMSI's results in the Answer Validation Exercise (AVE) 2008 for French. We tested two approaches during this campaign: a syntax-based strategy and a machine learning strategy. Results of both approaches are presented and discussed.
Supervised Machine Learning Techniques to Detect TimeML Events in French and English
Identifying events in texts is an information extraction task necessary for many NLP applications. Through the TimeML specifications and the TempEval challenges, it has received some attention in recent years; yet, no reference result is available for French. In this paper, we try to fill this gap by proposing several event extraction systems, combining for instance Conditional Random Fields, language modeling and k-nearest neighbors. These systems are evaluated on French corpora and compared with state-of-the-art methods on English. The very good results obtained on both languages validate our whole approach.
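One of the combined techniques, k-nearest-neighbors classification of tokens, can be sketched roughly as follows. The features, tags and training examples below are invented for illustration and are not the paper's actual feature set or data:

```python
# Toy sketch of a k-nearest-neighbors component for event detection:
# classify tokens as TimeML events or not from simple lexical features.
# Features and training examples are illustrative assumptions.
from collections import Counter

def features(token):
    """Map a (word, pos_tag) pair to a small numeric feature vector."""
    word, pos = token
    return (
        1.0 if pos.startswith("V") else 0.0,    # verbs often denote events
        1.0 if word.endswith("tion") else 0.0,  # deverbal nouns ("explosion")
        len(word) / 10.0,
    )

def knn_predict(train, token, k=3):
    """Label a token by majority vote among its k nearest training examples."""
    x = features(token)
    dist = lambda y: sum((a - b) ** 2 for a, b in zip(x, y))
    nearest = sorted(train, key=lambda ex: dist(features(ex[0])))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [
    (("exploded", "VBD"), "EVENT"),
    (("explosion", "NN"), "EVENT"),
    (("said", "VBD"), "EVENT"),
    (("table", "NN"), "O"),
    (("blue", "JJ"), "O"),
    (("paris", "NNP"), "O"),
]
print(knn_predict(train, ("erupted", "VBD")))  # nearest neighbors are verbs -> "EVENT"
```

In a realistic system the features would come from a tagger and lemmatizer, and this vote would typically be combined with CRF and language-model scores rather than used alone.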
Question Generation for French: Collating Parsers and Paraphrasing Questions
This article describes a question generation system for French. The transformation of declarative sentences into questions relies on two different syntactic parsers and on named entity recognition tools. This makes it possible to further diversify the generated questions and to partly alleviate the problems inherent in the analysis tools. The system also generates reformulations of the questions based on variations in the question words, inducing answers of different granularities, and on nominalisations of action verbs. We evaluate the questions generated for sentences extracted from two corpora: a corpus of newspaper articles used for the CLEF Question Answering evaluation campaign and a corpus of simplified online encyclopedia articles. The evaluation shows that the system generates a majority of good and medium-quality questions. We also present an original evaluation of the question generation system using the question analysis module of a question answering system.
Évaluation de la contextualisation de tweets [Evaluating tweet contextualization]
This article addresses the evaluation of tweet contextualization. Contextualization is defined as a summary that puts back into context a text which, because of its length, does not contain all the elements a reader needs to understand all or part of its content. We define an evaluation framework for tweet contextualization that generalizes to other short texts. We propose a reference collection as well as ad hoc evaluation measures. This evaluation framework was successfully applied in the INEX Tweet Contextualization campaign. In light of the results obtained during this campaign, we discuss the measures used in relation to other measures from the literature.
Overview of INEX Tweet Contextualization 2013 track
Twitter is increasingly used for online client and audience fishing; this motivated the tweet contextualization task at INEX. The objective is to help a user understand a tweet by providing a short summary (500 words). This summary should be built automatically from local resources such as Wikipedia, by extracting relevant passages and aggregating them into a coherent summary. The task is evaluated on informativeness, which is computed using a variant of Kullback-Leibler divergence together with passage pooling. Meanwhile, the effective readability in context of the summaries is checked using binary questionnaires on small samples of results. Running since 2010, the results show that only systems that efficiently combine passage retrieval, sentence segmentation and scoring, named entity recognition, POS analysis, anaphora detection, a content diversity measure and sentence reordering are effective.
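An informativeness score in the spirit of this Kullback-Leibler-based evaluation can be sketched as follows: compare the word distribution of a candidate summary against that of a pool of relevant passages. The smoothing constant, tokenization and example texts are illustrative assumptions, not the official INEX measure:

```python
# Sketch of a KL-divergence-based informativeness score: a summary whose
# unigram distribution is close to the pooled reference passages scores lower
# (better). Smoothing and whitespace tokenization are simplifying assumptions.
import math
from collections import Counter

def distribution(text, vocab, alpha=0.01):
    """Unigram distribution over vocab with additive smoothing."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(reference, summary):
    """KL(reference || summary): lower means the summary covers the pool better."""
    vocab = set(reference.lower().split()) | set(summary.lower().split())
    p = distribution(reference, vocab)
    q = distribution(summary, vocab)
    return sum(p[w] * math.log(p[w] / q[w]) for w in vocab)

pool = "the eruption forced the village to evacuate overnight"
good = "village evacuated overnight after eruption"
bad = "stock markets rallied on tech earnings"
print(f"KL(pool || good) = {kl_divergence(pool, good):.3f}")
print(f"KL(pool || bad)  = {kl_divergence(pool, bad):.3f}")
```

The actual campaign measure also involves passage pooling and length normalization; this sketch only shows the core divergence comparison.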
Overview of INEX Tweet Contextualization 2014 track
Messages of 140 characters are rarely self-contained. The Tweet Contextualization task aims at automatically providing information, in the form of a summary, that explains a tweet. This requires combining multiple types of processing, from information retrieval to multi-document summarization, including entity linking. Running since 2010, the 2014 task was a slight variant of previous ones, considering more complex queries from RepLab 2013. Given a tweet and a related entity, systems had to provide some context about the subject of the tweet from the perspective of the entity, in order to help the reader understand it.
Impact of translation on biomedical information extraction from real-life clinical notes
The objective of our study is to determine whether using English tools to
extract and normalize French medical concepts on translations provides
comparable performance to French models trained on a set of annotated French
clinical notes. We compare two methods: a method involving French language
models and a method involving English language models. For the native French
method, the Named Entity Recognition (NER) and normalization steps are
performed separately. For the translated English method, after the first
translation step, we compare a two-step method and a terminology-oriented
method that performs extraction and normalization at the same time. We used
French, English and bilingual annotated datasets to evaluate all steps (NER,
normalization and translation) of our algorithms. Concerning the results, the
native French method performs better than the translated English one, with a
global F1 score of 0.51 [0.47; 0.55] against 0.39 [0.34; 0.44] and 0.38
[0.36; 0.40] for the two English methods tested. In conclusion, despite recent
improvements in translation models, there is a significant performance
difference between the two approaches in favor of the native French method,
which is more effective on French medical texts, even with few annotated
documents.
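The bracketed intervals around the F1 scores read as confidence intervals; one common way to obtain them is a percentile bootstrap over documents. The sketch below uses synthetic per-document counts, and the resampling scheme is an assumption rather than the authors' documented procedure:

```python
# Sketch: micro-averaged F1 with a percentile-bootstrap confidence interval,
# producing output in the style of "0.51 [0.47; 0.55]". The per-document
# (tp, fp, fn) counts are synthetic, for illustration only.
import random

def f1(docs):
    """Micro-averaged F1 over (true_pos, false_pos, false_neg) triples."""
    tp = sum(d[0] for d in docs)
    fp = sum(d[1] for d in docs)
    fn = sum(d[2] for d in docs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def bootstrap_ci(docs, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for F1, resampling documents with replacement."""
    rng = random.Random(seed)
    scores = sorted(
        f1([rng.choice(docs) for _ in docs]) for _ in range(n_resamples)
    )
    lo = scores[int((1 - level) / 2 * n_resamples)]
    hi = scores[int((1 + level) / 2 * n_resamples)]
    return lo, hi

rng = random.Random(42)
docs = [(rng.randint(2, 8), rng.randint(0, 5), rng.randint(0, 5))
        for _ in range(50)]
point = f1(docs)
lo, hi = bootstrap_ci(docs)
print(f"F1 = {point:.2f} [{lo:.2f}; {hi:.2f}]")
```

Resampling at the document level (rather than the entity level) matches the idea that clinical notes, not individual mentions, are the independent units.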
Good practices for clinical data warehouse implementation: a case study in France
Real World Data (RWD) holds great promise to improve the quality of care.
However, specific infrastructures and methodologies are required to derive
robust knowledge and bring innovations to the patient. Drawing upon the
national case study of the governance of the 32 French regional and university
hospitals, we highlight key aspects of modern Clinical Data Warehouses (CDWs):
governance, transparency, types of data, data reuse, technical tools,
documentation and data quality control processes. Semi-structured interviews
and a review of reported studies on French CDWs were conducted from March to
November 2022. Out of 32 regional and
university hospitals in France, 14 have a CDW in production, 5 are
experimenting, 5 have a prospective CDW project, 8 did not have any CDW project
at the time of writing. The implementation of CDWs in France dates from 2011
and accelerated in late 2020. From this case study, we draw some general
guidelines for CDWs. The current orientation of CDWs towards research requires
efforts in governance stabilization, standardization of data schema and
development in data quality and data documentation. Particular attention must
be paid to the sustainability of the warehouse teams and to the multi-level
governance. The transparency of the studies and of the data transformation
tools must improve to enable successful multi-centric data reuse as well as
innovations in routine care.
Utilisation de la syntaxe pour valider les réponses à des questions par plusieurs documents [Using syntax to validate answers to questions across several documents]
This article presents FIDJI, a question-answering system for French that combines syntactic information about the question and the documents with more traditional techniques of the field, such as named entity recognition and term weighting. Within this system, we notably experiment with validating answers across several documents, as well as with specific techniques for answering different types of questions (such as questions expecting multiple answers (lists) or a Boolean answer).
- …